Purpose

This paper documents the production deployment for NIEMS on Friday 8 June 2018 and its
subsequent roll back. It summarises the actions that were completed and identifies lessons learned.

Background 






It is a 17/18 SPE milestone to launch the National Incident and Event Management System in 
Wellington and Christchurch. 

Project Progress 17/18 

Following a project reset, a high level roadmap out to 30 June 2018 was developed with the TOCs
and agreed by the NIEMS PSG in Nov 2017 then endorsed by the TGG in Dec 2017. See Appendix 1.
This roadmap was designed to deliver a first iteration towards the provision of consistent processes,
capability and tools to manage incidents and planned events nationally and is considered the first
feature set to be delivered under the Transport Operating System.



A more detailed roadmap was refined in early April 2018 in conjunction with the TOCs and shared with
the NIEMS PSG. This was further refined in May 2018. See Appendix 2 and 3.




Progress against the roadmap was reported on through the monthly Status Report to the NIEMS PSG
and the CIS portfolio report to TGG as well as in quarterly SLT reporting.

The project was tracking to deliver against the 17/18 SPE milestone until May 2018, when
recurring delays through the change process increased the risk of delivery timelines being impacted.


Production Deployment Planning 


As part of preparing to move ILS production to the cloud, a security assurance report on the Google
Cloud platform was completed and reviewed by the NIEMS PSG in Feb 2018.

The go live date was discussed and agreed with the WTOC Manager. This date was delayed several times
due to a combination of Fujitsu readiness for servers and extreme weather. The programme was
initially targeting a go live date of 10 May 2018, which was finally firmed up as 8 June 2018.

The TOC Operational Checklist Acceptance Criteria template was completed to prepare for 
deployment of production into the cloud. This includes sign off for relevant functions as follows: 


• Design 








• Security 

• Privacy 

• Test Planning and Reporting 

• Early Life Support 

• Support Operate Phase 

• Release Notes 

• Disaster Recovery 

• Business Sign Off 

• Project Management 

Training was not required as this release did not change the operator functionality. It was agreed that
the Government Chief Digital Office (GCDO) cloud assessment process for the Google Cloud Platform
would be progressed in parallel to this process.


The NZTA Fujitsu Change Management Request for Change Form was submitted to Fujitsu, from
which a Fujitsu work request was generated and included in their weekly change plan.

The approval to migrate the ILS database and enable ILS production to be deployed in the cloud was
given as an exception to the historical change process but supported by the completion of these
standard change forms.




Deployment 

Friday 8 June 1-5 AM 


The Project Manager, 2 Developers and the shift Team Leader completed the change in conjunction 
with the Fujitsu DBA on shift. 

An implementation test exit report was completed documenting the process. See Appendix 4.

The process was completed at 4:30 AM and an email notification advising the business of the successful
data migration was sent at 4:42 AM.


Post Deployment 


Four service desk requests were logged mid-late afternoon on 8 June after the deployment. These
related to user login issues and duplicate or triplicate log entries.

The user login issues related to the ITS active directory and were all resolved by 4:30 PM 8 June.
Duplicate or triplicate log entries were occurring intermittently when populated from TREIS into
NIEMS. The duplication issue was confirmed resolved by 7:00 PM 8 June.


Further intermittent problems presented as saving issues over the course of the weekend and a new
support ticket was raised on Sunday 10 June at 7:30 PM. The ticket was mis-categorised by Fujitsu as a
P3, i.e. fix in business hours. This was not escalated to a P2 until Monday 11 June at 8:00 AM after a
discussion with Fujitsu.


Five more issues were raised on 11 June for problems presenting as: 

• Details not saving or saving in the wrong event 

• Event initiator unable to reopen event in TREIS 

• Overall system slowness 





These have been categorised as symptoms of the same underlying problem. 


Early analysis suggested that the problems may have been due to a database configuration setting. 
As these are adjustable, some changes were made to try to resolve the issue. The problems were 
intermittent in their nature and configuration changes did not resolve the underlying problem. 


As part of the problem investigation the developer tried reducing the number of servers but this 
caused the SCATS team to lose access to ILS for a time. With a severe weather warning in place for 
the next 24 hours, alternative options were considered to resolve the issues. 

The decision was taken at 3:00 PM, in conjunction with WTOC, to implement a temporary roll back
and allow time to find a solution to the problem, whilst providing a stable solution for the operators.
Open tickets were manually re-entered back into the old ILS.

Problem Resolution 



The team completed an investigation into the user problems to identify and resolve the underlying 
cause of poor response times, page submit errors and system performance. 

The cause was determined to be the original ILS code using a database access library
(Hibernate) at one version and the newer NIEMS release using a later version of the library which
had some subtle changes in the management of connections to the database. As a result, when
usage reached a certain volume level, connections would start to become unavailable, leading to the
intermittent faults.
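
The failure mode described above can be illustrated with a minimal sketch. It is not the NIEMS or Hibernate code: the `TinyPool` class, the `run_requests` helper and the idea that some requests never return their connection are all assumptions made purely to show how a fixed-size pool can work at low volume and then start refusing requests once enough connections have gone missing.

```python
import queue


class PoolExhausted(Exception):
    """Raised when no free connection is left in the pool."""


class TinyPool:
    """Toy fixed-size connection pool (hypothetical, not Hibernate's API)."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self):
        # Fail immediately instead of waiting, to keep the sketch deterministic.
        try:
            return self._free.get_nowait()
        except queue.Empty:
            raise PoolExhausted("no connection available") from None

    def release(self, conn):
        self._free.put(conn)


def run_requests(pool, n_requests, leak_every):
    """Serve n_requests in sequence and return how many could not get a connection.

    Every `leak_every`-th request is assumed to keep its connection forever
    (leak_every=0 means no request leaks).
    """
    failed = 0
    for i in range(1, n_requests + 1):
        try:
            conn = pool.acquire()
        except PoolExhausted:
            failed += 1
            continue
        if leak_every and i % leak_every == 0:
            continue  # connection is never handed back to the pool
        pool.release(conn)
    return failed
```

With a pool of five connections and one request in ten keeping its connection, the first fifty requests succeed and every later request fails, which is consistent with faults that only appear once usage reaches a certain volume.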

On Monday 11 June, the team were able to reproduce the errors, first on the staging server and
later on local development systems. Code changes were made to resolve the problem. Performance
load and stress tests were run and passed. Performance tests for the cloud deployment were also
reproduced locally at WTOC and passed.


The following changes have been made:

Change: Update the ILS code to [text missing] library
Impact: Resolve the issue of database connection pools being exhausted quickly

Change: Improve the efficiency of the old ILS code by optimising access to the database to reduce the
number of redundant calls in normal usage
Impact: Reduces overall stress on the system and will improve overall response times and server load

Change: Optimise some of the new NIEMS code
Impact: Reduce query times

Change: Increase database instance size
Impact: Increase the maximum number of connections to database

Change: Improve ILS notification mechanism so that they are regional based
Impact: Events updated in Auckland do not impact on the server for Wellington users

Change: Create JMeter performance test
Impact: Stress test so that we know what the upper limit of the system's performance is



Retrospective 


A lessons learned meeting was undertaken with the NIEMS project team on Tuesday 12 June 2018 to 
understand what happened, why and identify opportunities to improve. The following observations 
and recommendations were documented: 













Observation: Application was tested manually by users at the time of release and through automated
testing but did not undergo sufficient load testing with multiple users
Recommendation: Automated parallel and high volume load testing with prior quality test coverage is
implemented and added to the Definition of Done

Observation: A detailed Implementation Plan was done several weeks in advance but the Fujitsu staff
implementing the plan were unfamiliar with it at the time of executing the plan
Recommendation: "Full Dress Rehearsal" with all parties involved prior to D-Day

Observation: Multiple cross communication due to key staff away sick
Recommendation: Recommend clear lines of communication, escalation and accountability be published

Observation: Interruptions to staff working on incident response
Recommendation: Dedicated staff working on the problem should not be interrupted to give regular
updates. Set up an incident management group chat to manage communications within the team and
nominate a spokesperson to interact with other stakeholders

Observation: Changes directly in production affected live users
Recommendation: No change be done in Production without going through stage testing

Observation: The Fujitsu support desk was unclear on the escalation rules and sign offs for the
process were unclear
Recommendation: Support process be published and signed off by all parties prior to D-Day

Observation: A new release of software with fundamental changes to underlying components was not
bedded down with users for a long enough period
Recommendation: Designated staff be on hand with the customers for the initial few days especially if
this is over a weekend.




Issue Resolution and Performance Testing 

The underlying problem that manifested as poor response times, page submit errors and other 
problems was investigated, identified and corrected. 


The problem was caused by the original ILS code using a database access library (Hibernate) at one
version and the newer NIEMS release using a later version of the library which had some subtle
changes in the management of connections to the database. As a result, when usage reached a
certain volume level, connections would start to become unavailable, leading to the intermittent
faults.


The errors were reproduced first on the staging server and later on local development systems,
initially through a manual load test and later through an automated test.

































A JMeter performance test was created in order to stress test the system. This test emulates a
number of users constantly using the NIEMS UI and creating incidents. The numbers are increased
until the system starts to show an error or increasing response times.
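
The ramp-up approach described above can be sketched in a few lines. This is not the JMeter test plan itself: `simulated_response_time` is a made-up stand-in for the real system, and the starting load, step size and the 1.5x slowdown threshold in `find_upper_limit` are illustrative assumptions. A real run would replace the stand-in with measurements taken from the load tool.

```python
def simulated_response_time(users, capacity=40, base_ms=200.0):
    """Hypothetical system model: flat latency up to `capacity` users,
    rising latency beyond it, and None (an error) past twice the capacity."""
    if users <= capacity:
        return base_ms
    if users > capacity * 2:
        return None
    return base_ms * (1 + (users - capacity) / 10)


def find_upper_limit(measure, start=10, step=10, max_users=500, slow_factor=1.5):
    """Increase the user count until `measure` reports an error or a response
    time above slow_factor x the baseline; return the last healthy user count."""
    baseline = measure(start)
    if baseline is None:
        return None  # the system already fails at the starting load
    last_good = None
    users = start
    while users <= max_users:
        response_ms = measure(users)
        if response_ms is None or response_ms > baseline * slow_factor:
            break
        last_good = users
        users += step
    return last_good
```

Calling `find_upper_limit(simulated_response_time)` steps through 10, 20, 30 and 40 users at the baseline latency and stops at 50, where the modelled response time doubles, so 40 is reported as the upper limit of the simulated system.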
Redeployment

Detailed Planning Update 




A detailed plan to redeploy ILS to the cloud was revised and agreed with the business on Thursday
21 June 2018 with a new target date of Saturday 23 June 2018 subject to weather, road works and
no major incidents. See 0618.3.0 NIEMS Releases Detailed Plan.

The plan included a higher level of on-site support for the first days following the release. 

[text missing] off and a final go-live was agreed to at 6 PM on 22 June.


Redeployment 
Saturday 23 June 7-11 PM 


The Manager Transport Operating System and Business Analyst were on site in Wellington for the
deployment. Along with the Project Manager, Developers and the shift Team Leader they
completed the change in conjunction with the Fujitsu DBA on shift.

An implementation test exit report was completed documenting the process. See Appendix 5. The
process was completed at 11 PM and an email notification advising of the successful data migration
was sent.




On-site support from the project team was maintained for 48 hours as per the rotating schedule and
observations were documented in the Post Release After Care Notes.

No new problems were identified and operators' feedback was that they experienced increased
performance from the new system.




Appendix 1 

Approved High Level NIEMS Roadmap - November 2017


National Incident and Event Management System (NIEMS) 


Project roadmap 

Draft NIEMS Road Map 
Appendix 2

Updated NIEMS Roadmap - 6 April 2018

Appendix 3

Updated NIEMS Roadmap - April 2018

Appendix 4

Implementation Test Exit Report - 8 June 2018 


Implementation Steps:

Task # | Details | Implementer(s) | Date/Time
1 | Create full database backup from ILS SQL Server Database PROD | Fujitsu DBA | 08/06/2018 01:01
2 | Zip the backup and encrypt it with a password. The password must be at least 10 characters long and should contain symbols and alphanumeric characters. | Fujitsu DBA | 08/06/2018
3 | Link to internal NZTA folder | Fujitsu DBA | 08/06/2018 02:10
4 | S9(2)(a) | Fujitsu DBA | 08/06/2018 02:15
5 | Convert SQL Server dump to Postgres SQL dump | S9(2)(a) (NZTA) | 08/06/2018 02:40
6 | Make Postgres SQL dump compatible with GCP Postgres | S9(2)(a) (NZTA) | 08/06/2018 02:50
7 | Upload Postgres dump to GCP | S9(2)(a) (NZTA) | 08/06/2018 02:55
8 | Apply Postgres dump to NIEMS production database | S9(2)(a) (NZTA) | 08/06/2018 03:00
9 | Release/Start NIEMS application prod | S9(2)(a) (NZTA) | 08/06/2018 03:05

Verification Plan:
(High level test steps to qualify the change outcome with supporting doc emailed to Change team if
required. Include Technical and User Testing.)

Application Checks:
a) Check application is up and running
b) Check all pages load successfully
c) Compare what is shown on the page to what's in the database
d) Create a test event/incident to ensure it works.
e) Run reports
f) Check Travel Times (to ensure connectivity to TIM)
Login to Corporate [text illegible]
Date/Time: 08/06/2018 03:05

Database checks:
a) Check NIEMS Postgres DB tables have exactly the same number of rows as ILS SQL Server
b) Check sequences are correctly set
c) Check 'TEXT' datatype columns are correctly populated
d) Check Geometry datatype columns are correctly populated
Implementer(s): S9(2)(a) (NZTA)
Date/Time: 08/06/2018 03:10

Business Test:
WTOC to test NIEMS application as per the embedded test plan.
Implementer(s): WTOC (NZTA)
Date/Time: 08/06/2018 03:10

Go/No-Go decision
Date/Time: 08/06/2018 04:25

If GO, notifies WTOC to use NIEMS application
Date/Time: 08/06/2018 04:30
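
The row-count check listed under the database checks above could be automated along the following lines. This is a sketch under stated assumptions, not the script used on the night: the `table_row_counts` and `compare_row_counts` helpers are hypothetical, and in-memory SQLite connections stand in for the ILS SQL Server source and the NIEMS Postgres target so that the example runs on its own.

```python
import sqlite3


def table_row_counts(conn, tables):
    """Return {table: row count} for a fixed, trusted list of table names."""
    counts = {}
    for table in tables:
        # Table names are interpolated, so they must come from a trusted list.
        counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return counts


def compare_row_counts(source, target, tables):
    """Return {table: (source_count, target_count)} for tables that differ.

    An empty result means every listed table has the same number of rows
    in both databases.
    """
    src = table_row_counts(source, tables)
    tgt = table_row_counts(target, tables)
    return {t: (src[t], tgt[t]) for t in tables if src[t] != tgt[t]}
```

An empty dictionary is the pass condition. Any entry names a table whose counts differ and gives both counts, which is enough to inform a Go/No-Go decision without inspecting the data itself.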



























Appendix 5 

Implementation Test Exit Report - 23 June 2018 



Test # | Test | Pass | Fail | Comments
1 | Log in - Chrome, time taken to load, same or better as current ILS | X | | Same timing
2 | Log in - Firefox, time taken to load, same or better as current ILS | X | | Same timing
3 | Log in - IE, time taken to load, same or better as current ILS | X | X | Now works in IE
4 | Historical data available from February 2018 in all tabs and able to search in current, recent and history | [illegible] | | 108 extra events in the system from previous rollout
5 | Historical data available from July 2017 in all tabs and able to search in current, recent and history | X | |
6 | TREIS roadworks displayed | X | |
7 | TREIS roadworks data description, comment details, location, TREIS ref# displayed | X | | Checked 5 events, all showing correct information
8 | TREIS incidents displayed | X | | Checked 5 events, all showing correct information
9 | TREIS incidents data description, location, source, start time, notification time, incident level, TREIS ref # displayed | X | |
10 | Open current fault from the fault tab, search current fault in this tab | X | |
11 | Open recent fault from the fault tab, search recent fault in this tab | X | |
12 | Open fault history from the fault tab, search within fault history | X | | 3 extra events in the previous system from previous rollout
13 | Pop up menu works in Fault tab | X | |
14 | Faults: Dropdown fields are populated and work, able to select all options, able to change/clear selection | X | |
15 | Open current SCATS from the SCATS tab, and search for current SCATS event | X | |
16 | Open recent SCATS from the SCATS tab, and search for recent SCATS event | X | |
17 | Open SCATS history from the SCATS tab, and search for SCATS event from history | X | | 33 extra events in the previous system from previous rollout
18 | Open travel times tab and display information | X | | Journey times not displaying but the signs are off
19 | Overall speed (faster than current ILS), able to move between tabs and events with ease (no delay) | [illegible] | |
20 | Create new closure, able to complete all fields | X | | Slight delay in creating, less than 5 seconds
21 | Progress Closure with "Sign Out" button | X | |
22 | "Sign out" added to timeline timestamp accurate, timestamp edited | X | |
23 | "Sign out" timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
24 | Progress Closure with "Closure In" button | X | |
25 | "Closure In" added to timeline timestamp accurate | X | |
26 | Closure timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
27 | Progress Closure with "Police Advised" button | X | |
28 | "Police Advised" added to timeline timestamp accurate | X | |
29 | "Police Advised" timeline timestamp edited | X | | Changes to times not captured in timeline - same as ILS testing
30 | Progress Closure with "Reopened" button | X | |
31 | "Reopened" added to timeline timestamp accurate | X | |
32 | "Reopened" timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
33 | Progress Closure with "Off Network" button | X | |
34 | "Off Network" added to timeline timestamp accurate | [illegible] | |
35 | "Off Network" timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
36 | Create a new incident, able to complete all fields | X | | All sections tested, map tested, timeline checked, all buttons checked all ok - Only once did the field self-clear
37 | Progress incident using "All Lanes Open" button | X | |
38 | "All lanes open" added to timeline timestamp accurate | X | |
39 | "All lanes open" timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
40 | Progress incident using "Traffic Normal" button | X | | Timings much better than previous version, picked up in 3 seconds
41 | "Traffic Normal" added to timeline, timestamp accurate | X | |
42 | "Traffic Normal" timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
43 | Progress incident using "Close" button | X | |
44 | "Close" added to timeline, timestamp accurate | X | |
45 | "Close" timeline timestamp editable | X | |
46 | Create new roadwork, able to complete all fields | [illegible] | | STMS will not populate but this section not used by Wellington, ATOC has a list that populates for them in this
47 | Progress roadwork using "Roadworks in" button | | |
48 | "Roadworks in" added to timeline timestamp accurate | X | |
49 | "Roadworks in" timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
50 | Progress incident using "Off network" button | X | |
51 | "Off network" added to timeline, timestamp accurate | X | |
52 | "Off network" timeline timestamp editable | X | | Changes to times not captured in timeline - same as ILS testing
53 | Open Detail Report (All Incidents) from the reports tab | X | | Timeline showing out of sequence
54 | Open Detail Report (Level 3 and Above) from the reports tab | X | | Time captures are showing the time down to the millisecond
55 | Open Summary Report (All Incidents) from the reports tab | X | |
56 | Open Monthly Summary Report (Events) from the reports tab | X | |
57 | Open Duration Report (Incidents) from the reports tab | X | |
58 | Open Roadworks Summary from the reports tab | X | |
59 | Open Closure Summary from the reports tab | X | |
60 | Open Timings Report (Incidents) from the reports tab | X | |
61 | Open Incident Duration Report Enhanced from the reports tab | X | |
62 | Open Summary Report (Level 3 and above) from the reports tab | | |
63 | Open Summary Report (Level 2 and above) from the reports tab | X | |
64 | Open Incident Report (SCATS) from the reports tab | X | |
65 | Open Incident Duration Report Contractor from the reports tab | X | |
66 | Open Incident Duration Report New from the reports tab | X | |
67 | Open Month Summary Report (Events) By Region from the reports tab | X | | Order changed but not badly
68 | Open General Summary Report from the reports tab | X | |
69 | Open Fault Summary Report from the reports tab | X | |
70 | Open Twitter Summary Report from the reports tab | X | | Checked by S9(2)(a)
71 | Open TAR Summary Report from the reports tab | X | | Checked by S9(2)(a)
72 | Generate a Detail Report (All Incidents) using the date range fields | X | |
73 | Generate a Detail Report (Level 3 and Above) using the date range fields | X | |
74 | Generate a Summary Report (All Incidents) using the date range fields | X | |
75 | Generate a Monthly Summary Report (Events) using the date range fields | X | |
76 | Generate a Duration Report (Incidents) using the date range fields | X | |
77 | Generate a Roadworks Summary using the date range fields | X | |
78 | Generate a Closure Summary using the date range fields | X | |
79 | Generate a Timings Report (Incidents) using the date range fields | X | |
80 | Generate an Incident Duration Report Enhanced using the date range fields | | |
81 | Generate a Summary Report (Level 3 and above) using the date range fields | [illegible] | |
82 | Generate a Summary Report (Level 2 and above) using the date range fields | X | |
83 | Generate an Incident Report (SCATS) using the date range fields | X | |
84 | Generate an Incident Duration Report Enhanced using the date range fields | X | |
85 | Generate an Incident Duration Report New using the date range fields | X | |
86 | Generate a Month Summary Report (Events) By Region using the date range fields | X | |
87 | Generate a General Summary Report using the date range fields | X | |
88 | Generate a Fault Summary Report using the date range fields | X | |
89 | Generate a Twitter Summary Report using the date range fields | X | |
90 | Generate a TAR Summary Report using the date range fields | X | | Checked by S9(2)(a)
91 | Map opens when I click on map tab | X | |
92 | Manually plot event on map | X | |
93 | Search map using hot key (ALT + L) and plot on the map, able to change location by re-searching using ALT + L | X | | Search now working in new system
94 | Change locations using the search function | X | |
95 | Map speed | X | | Minor lag, comparable to ILS
96 | Speed of buttons updating fields in details | X | | Much better than NIEMS was prior to the upgrade
97 | Clicking on the header in incidents/roadworks/faults brings up the details in section down below | [illegible] | | Checked multiple tabs and multiple (5+) events and all populating as it should
98 | Make sure the notes when added to the event do not populate on another event | X | | Added notes to three separate events - working fine
99 | No duplicate TREIS events populating in ILS | X | | TREIS event created at 22:57, populated in ILS at 22:59, no duplicate reported
100 | Dummy event entered into TREIS and populates into ILS system | X | | Same as above.